Parquet to bypass Arrow #3

Merged
whitphx merged 10 commits into stlite-1.24.0 from feature/1.24-bypass-arrow-with-parquet
Aug 23, 2023

Conversation

whitphx
Owner

@whitphx whitphx commented Aug 18, 2023

No description provided.


@lukasmasuch lukasmasuch left a comment


LGTM 👍 This is awesome! It will unlock a lot of interesting features we released in the last couple of months 🙌

However, there might be incompatibilities with some of the less commonly used column types. You could test this by running some of the scripts we use for e2e testing, e.g. st_data_editor_column_types.py or most of the st_arrow_... scripts.

      _LOGGER.info(
-         "Serialization of dataframe to Arrow table was unsuccessful due to: %s. "
+         "Serialization of dataframe to Parquet table was unsuccessful due to: %s. "
          "Applying automatic fixes for column types to make the dataframe Arrow-compatible.",
          ex,
      )
      df = fix_arrow_incompatible_column_types(df)


Just a note: Parquet might have incompatibilities that the Arrow serialization does not have. It might be good to handle them here once we have identified the column types that don't work.
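For readers following the diff above, the fallback pattern it logs can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from this PR: `toy_serialize`, `stringify_columns`, and `to_bytes_with_fallback` are hypothetical names, the table is modeled as a plain dict of column lists, and JSON stands in for the Parquet serializer. Catching `TypeError` as well as `ValueError` mirrors the later commit title referenced on this page.

```python
import json
import logging

_LOGGER = logging.getLogger(__name__)


def toy_serialize(table: dict) -> bytes:
    """Hypothetical stand-in for the real serializer.

    json.dumps raises TypeError for cells it cannot encode, which loosely
    mimics a serializer rejecting an unsupported column type.
    """
    return json.dumps(table).encode("utf-8")


def stringify_columns(table: dict) -> dict:
    """Hypothetical stand-in for fix_arrow_incompatible_column_types.

    Every cell that is not a plain scalar is converted to its string form.
    """
    plain = (str, int, float, bool, type(None))
    return {
        name: [cell if isinstance(cell, plain) else str(cell) for cell in cells]
        for name, cells in table.items()
    }


def to_bytes_with_fallback(table: dict, serialize=toy_serialize, fix=stringify_columns) -> bytes:
    """Try to serialize; on failure, log, fix the column types, and retry once."""
    try:
        return serialize(table)
    except (ValueError, TypeError) as ex:
        _LOGGER.info(
            "Serialization of table was unsuccessful due to: %s. "
            "Applying automatic fixes for column types.",
            ex,
        )
        return serialize(fix(table))
```

The retry is deliberately not wrapped in a second `try`: if the fixed table still fails, the exception propagates, so a new incompatibility shows up instead of being hidden.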

@lukasmasuch

I did some tests, and most of the common data/column types work fine. This includes: string, boolean, integer, float, datetime 👍 This will already cover most use cases. Unfortunately, some of the less common data types don't work: list, date, time, interval, period. We might be able to get them working with a bit more debugging or, as a fallback, add them to fix_arrow_incompatible_column_types in stlite so that the data can at least be viewed instead of raising an exception.
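The fallback idea in the comment above could look roughly like the sketch below. It is an assumption-laden illustration rather than the stlite implementation: `needs_string_fallback` and `fix_column` are hypothetical helpers, columns are plain Python lists, and only the types available in the standard library (list, date, time) are covered. The pandas-specific interval and period types are left out.

```python
import datetime


def needs_string_fallback(value) -> bool:
    """Return True for value kinds reported as failing: list, date, time."""
    # datetime was reported to work. It must be checked first because
    # datetime.datetime is a subclass of datetime.date.
    if isinstance(value, datetime.datetime):
        return False
    return isinstance(value, (list, datetime.date, datetime.time))


def fix_column(cells: list) -> list:
    """Stringify the whole column if any cell needs the fallback.

    Missing values (None) are kept as None so they still display as empty.
    Columns without problematic cells are returned unchanged.
    """
    if any(needs_string_fallback(cell) for cell in cells):
        return [None if cell is None else str(cell) for cell in cells]
    return cells
```

Converting the whole column, rather than only the offending cells, keeps the column type uniform. The trade-off is that the values become view-only strings.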

@whitphx whitphx merged commit 00a8991 into stlite-1.24.0 Aug 23, 2023
@whitphx whitphx deleted the feature/1.24-bypass-arrow-with-parquet branch August 23, 2023 10:34
@whitphx
Owner Author

whitphx commented Aug 23, 2023

@lukasmasuch Thank you very much for all your help!

whitphx added a commit that referenced this pull request Jan 7, 2024
* Introduce fastparquet on the Python side and parquet-wasm on the JS side to bypass the Arrow serialization for DataFrame

* Patch DataEditor not to use PyArrow

* Add setTimeout() so that the import of parquet-wasm works in @stlite/mountable

* Fix comments

* Fix incompatibilities with some column types

* Change logic to handle lists from fastparquet

* Move the decoding above the string parsing

---------

Co-authored-by: lukasmasuch <lukas.masuch@gmail.com>
whitphx added a commit that referenced this pull request Jan 8, 2024
whitphx added a commit that referenced this pull request Jan 19, 2024
whitphx added a commit that referenced this pull request Feb 7, 2024
whitphx added a commit that referenced this pull request Feb 10, 2024
whitphx added a commit that referenced this pull request Mar 12, 2024
whitphx added a commit that referenced this pull request Mar 13, 2024
whitphx added a commit that referenced this pull request Mar 13, 2024
whitphx added a commit that referenced this pull request Mar 13, 2024
whitphx added a commit that referenced this pull request Mar 13, 2024
whitphx added a commit that referenced this pull request Mar 13, 2024
whitphx added a commit that referenced this pull request Mar 13, 2024
whitphx added a commit that referenced this pull request Mar 15, 2024
whitphx added a commit that referenced this pull request Apr 3, 2024
whitphx added a commit that referenced this pull request Apr 28, 2024
whitphx added a commit that referenced this pull request Apr 28, 2024
whitphx added a commit that referenced this pull request May 24, 2024
whitphx added a commit that referenced this pull request May 25, 2024
whitphx added a commit that referenced this pull request Jun 25, 2024
whitphx added a commit that referenced this pull request Jun 25, 2024
whitphx added a commit that referenced this pull request Jun 26, 2024
whitphx added a commit that referenced this pull request Jun 27, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)

whitphx added a commit that referenced this pull request Aug 5, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)

whitphx added a commit that referenced this pull request Aug 27, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)

whitphx added a commit that referenced this pull request Aug 27, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)

whitphx added a commit that referenced this pull request Aug 27, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)

whitphx added a commit that referenced this pull request Aug 27, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)

whitphx added a commit that referenced this pull request Aug 27, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Aug 30, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Aug 30, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Aug 30, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Oct 4, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Oct 4, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Oct 4, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Nov 13, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Dec 11, 2024
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Jan 7, 2025
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)

whitphx added a commit that referenced this pull request Jan 10, 2025
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7), Fix Quiver.ts (#23)

whitphx added a commit that referenced this pull request Jan 10, 2025
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7), Fix Quiver.ts (#23)
